Resources for Multilingual Text Generation in Three Slavic Languages

نویسندگان

  • John A. Bateman
  • Elke Teich
  • Geert-Jan M. Kruijff
  • Ivana Kruijff-Korbayová
  • Serge Sharoff
  • Hana Skoumalová
چکیده

The paper discusses the methods followed to re-use a large-scale, broad-coverage English grammar for constructing similar scale grammars for Bulgarian, Czech and Russian for the fast prototyping of a multilingual generation system. We present (1) the theoretical and methodological basis for resource sharing across languages, (2) the use of a corpus-based contrastive register analysis, in particular, contrastive analysis of mood and agency. Because the study concerns reuse of the grammar of a language that is typologically quite different from the languages treated, the issues addressed in this paper appear relevant to a wider range of researchers in need of largescale grammars for less-researched languages.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multilinguality in a Text Generation System For Three Slavic Languages

This paper describes a multilingual text generation system in the domain of CAD/CAM software instructions for Bulgarian, Czech and Russian. Starting from a language-independent semantic representation, the system drafts natural, continuous text as typically found in software manuals. The core modules for strategic and tactical generation are implemented using the KPML platform for linguistic re...

متن کامل

The MULTEXT-East Morphosyntactic Specifications for Slavic Languages

Word-level morphosyntactic descriptions, such as “Ncmsn” designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or t...

متن کامل

The MULTEXT-East Morphosyntactic Specification for Slavic Languages

Word-level morphosyntactic descriptions, such as “Ncmsn” designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or t...

متن کامل

Handling Word Order in a Multilingual System for Generation of Instructions

Slavic languages are characteristic by their relatively high degree of word order freedom. In the process of automatic generation from an underlying representation of the content, we have to ensure that a semantically and contextually appropriate word order is chosen. In this paper, we elucidate information structure as the main factor determining word order in Slavic languages, and we present ...

متن کامل

Multilingual Lexical Database Generation from Parallel Texts in 20 European Languages with Endogenous Resources

This paper deals with multilingual database generation from parallel corpora. The idea is to contribute to the enrichment of lexical databases for languages with few linguistic resources. Our approach is endogenous: it relies on the raw texts only, it does not require external linguistic resources such as stemmers or taggers. The system produces alignments for the 20 European languages of the ‘...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000